AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.83)

Neural Information Processing Systems, Feb-15-2026, 19:43:53 GMT

MAN TruckScenes: A multimodal dataset for autonomous trucking in diverse conditions

The scenes are tagged according to 34 distinct scene tags, and all objects are tracked throughout the scene to promote a wide range of applications.

artificial intelligence, machine learning, object-oriented architecture, (17 more...)

Country:

Europe > Switzerland (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, Dec-8-2025

Two-Stage Camera Calibration Method for Multi-Camera Systems Using Scene Geometry

Abramov, Aleksandr

Calibration of multi-camera systems is a key task for accurate object tracking. However, it remains a challenging problem in real-world conditions, where traditional methods are not applicable due to the lack of accurate floor plans, physical access to place calibration patterns, or synchronized video streams. This paper presents a novel two-stage calibration method that overcomes these limitations. In the first stage, partial calibration of individual cameras is performed based on an operator's annotation of natural geometric primitives (parallel, perpendicular, and vertical lines, or line segments of equal length). This allows estimating key parameters (roll, pitch, focal length) and projecting the camera's Effective Field of View (EFOV) onto the horizontal plane in a base 3D coordinate system. In the second stage, precise system calibration is achieved through interactive manipulation of the projected EFOV polygons. The operator adjusts their position, scale, and rotation to align them with the floor plan or, in its absence, using virtual calibration elements projected onto all cameras in the system. This determines the remaining extrinsic parameters (camera position and yaw). Calibration requires only a static image from each camera, eliminating the need for physical access or synchronized video. The method is implemented as a practical web service. Comparative analysis and demonstration videos confirm the method's applicability, accuracy, and flexibility, enabling the deployment of precise multi-camera tracking systems in scenarios previously considered infeasible.
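The abstract does not give the formulas used in the first stage. A classical way to get focal length from annotated lines, sketched here under stated assumptions (square pixels, zero skew, principal point c known, e.g. the image center), is to intersect the image lines of two segments that are parallel in the scene to obtain a vanishing point, then combine the vanishing points v1 and v2 of two perpendicular scene directions via f² = −(v1 − c)·(v2 − c). The function names are hypothetical, and roll and pitch estimation are not shown.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Line = Tuple[float, float, float]


def line_through(p: Point, q: Point) -> Line:
    """Homogeneous image line through two annotated points: (x1, y1, 1) x (x2, y2, 1)."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def vanishing_point(seg_a: Tuple[Point, Point], seg_b: Tuple[Point, Point]) -> Point:
    """Intersect the image lines of two segments annotated as parallel in the scene."""
    a1, b1, c1 = line_through(*seg_a)
    a2, b2, c2 = line_through(*seg_b)
    w = a1 * b2 - a2 * b1  # homogeneous scale of the intersection point
    if abs(w) < 1e-12:
        raise ValueError("segments are parallel in the image; vanishing point at infinity")
    return ((b1 * c2 - b2 * c1) / w, (a2 * c1 - a1 * c2) / w)


def focal_from_orthogonal_vps(v1: Point, v2: Point, principal: Point) -> float:
    """Focal length in pixels from vanishing points of two perpendicular scene directions.

    Assumes square pixels, zero skew and a known principal point, so that
    (v1 - c) . (v2 - c) + f**2 = 0.
    """
    cx, cy = principal
    dot = (v1[0] - cx) * (v2[0] - cx) + (v1[1] - cy) * (v2[1] - cy)
    if dot >= 0:
        raise ValueError("vanishing points are inconsistent with perpendicular directions")
    return math.sqrt(-dot)


if __name__ == "__main__":
    # Two image segments whose scene lines are parallel (e.g. floor tile edges).
    vp = vanishing_point(((0, 0), (720, 180)), ((0, 720), (720, 540)))
    f = focal_from_orthogonal_vps(vp, (-160.0, 360.0), principal=(640.0, 360.0))
    print(vp, f)  # prints: (1440.0, 360.0) 800.0
```

With more than two annotated segments per direction, a least-squares intersection would be more robust than the two-line intersection used here.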

artificial intelligence, calibration, machine learning, (17 more...)

2512.05171

Genre: Research Report (0.40)

Industry: Media > Photography (0.35)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Keskar, Maitrayee, Trivedi, Mohan, Greer, Ross

MTR-VP: Towards End-to-End Trajectory Planning through Context-Driven Image Encoding and Multiple Trajectory Prediction

arXiv.org Artificial Intelligence, Dec-1-2025

We present a method for trajectory planning for autonomous driving that learns image-based context embeddings aligned with motion prediction frameworks and planning-based intention input. Within our method, a ViT encoder takes raw images and the past kinematic state as input and is trained to produce context embeddings, drawing inspiration from those generated by the recent MTR (Motion Transformer) encoder, effectively substituting map-based features with learned visual representations. MTR provides a strong foundation for multimodal trajectory prediction by localizing agent intent and refining motion iteratively via motion query pairs. We name our approach MTR-VP (Motion Transformer for Vision-based Planning). Instead of the learnable intention queries used in the MTR decoder, we use cross attention on the intent and the context embeddings, which reflect a combination of information encoded from the driving scene and past vehicle states. We evaluate our method on the Waymo End-to-End Driving Dataset, which requires predicting the agent's future 5-second trajectory in bird's-eye-view coordinates using prior camera images, agent pose history, and routing goals. We analyze our architecture using ablation studies that remove the input images and the multiple-trajectory output. Our results suggest that transformer-based methods that combine visual features with kinematic features, such as past trajectory features, are not effective at fusing both modalities into useful scene context embeddings, even when intention embeddings are augmented with foundation-model representations of scene context from CLIP and DINOv2. However, predicting a distribution over multiple futures instead of a single future trajectory boosts planning performance.
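To make the "multiple futures" finding concrete, here is a minimal sketch of how a K-mode output is typically consumed: a softmax over per-mode scores picks the plan to execute, and best-of-K error (minADE) measures how close any hypothesis came to the logged future. This is not the MTR-VP decoder; the mode count, waypoint format and function names are hypothetical, and the official metrics of the Waymo End-to-End Driving Dataset may differ.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over per-mode scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ade(pred: Trajectory, gt: Trajectory) -> float:
    """Average displacement error between two equally long BEV trajectories."""
    if len(pred) != len(gt):
        raise ValueError("trajectories must have the same number of waypoints")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def select_plan(modes: Sequence[Trajectory], logits: Sequence[float]) -> Tuple[int, float]:
    """Index and probability of the most likely mode, used as the executed plan."""
    probs = softmax(logits)
    best = max(range(len(modes)), key=probs.__getitem__)
    return best, probs[best]


def min_ade(modes: Sequence[Trajectory], gt: Trajectory) -> float:
    """Best-of-K error: distance of the closest hypothesis to the logged future."""
    return min(ade(m, gt) for m in modes)


if __name__ == "__main__":
    gt = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    modes = [
        [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],  # keeps the lane
        [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0)],  # offset one metre sideways
    ]
    idx, p = select_plan(modes, logits=[0.0, 2.0])
    print(idx, round(p, 3), ade(modes[idx], gt), min_ade(modes, gt))  # prints: 1 0.881 1.0 0.0
```

The toy case shows why the two numbers are reported separately: the correct future is among the hypotheses (minADE of 0), yet the highest-scoring mode is the wrong one.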

large language model, machine learning, trajectory, (22 more...)

2511.22181

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Transportation > Ground > Road (0.51)
Information Technology (0.37)
Automobiles & Trucks (0.37)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.88)

Neural Information Processing Systems, Nov-21-2025, 14:47:56 GMT

Backprop KF: Learning Discriminative Deterministic State Estimators

backprop kf, learning discriminative deterministic state estimator, name change, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.83)

arXiv.org Artificial Intelligence, Oct-15-2025

mmWave Radar-Based Non-Line-of-Sight Pedestrian Localization at T-Junctions Utilizing Road Layout Extraction via Camera

Park, Byeonggyu, Kim, Hee-Yeun, Choi, Byonghyok, Cho, Hansang, Kim, Byungkwan, Lee, Soomok, Jeon, Mingu, Kim, Seong-Woo

Pedestrian localization in Non-Line-of-Sight (NLoS) regions within urban environments poses a significant challenge for autonomous driving systems. While mmWave radar has demonstrated potential for detecting objects in such scenarios, 2D radar point cloud (PCD) data is susceptible to distortions caused by multipath reflections, making accurate spatial inference difficult. Additionally, although camera images provide high-resolution visual information, they lack depth perception and cannot directly observe objects in NLoS regions. In this paper, we propose a novel framework that interprets radar PCD through the road layout inferred from the camera to localize NLoS pedestrians. The proposed method leverages visual information from the camera to interpret 2D radar PCD, enabling spatial scene reconstruction. The effectiveness of the proposed approach is validated through experiments using a radar-camera system mounted on a real vehicle, and localization performance is evaluated on a dataset collected in outdoor NLoS driving environments, demonstrating the practical applicability of the method.
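The abstract does not say how the road layout is used to interpret the radar points. One common idea for NLoS radar, shown here only as a hypothetical sketch, assumes a single specular bounce off a straight wall taken from the camera-derived layout: a detection that appears behind the wall (on the opposite side from the radar) is treated as a mirror-image ghost and reflected back across the wall line. The 2D ego frame, the radar at the origin and all function names are assumptions, not the authors' method.

```python
from typing import Tuple

Point = Tuple[float, float]


def side_of_line(p: Point, a: Point, b: Point) -> float:
    """Signed area test: > 0 left of a->b, < 0 right of it, 0 on the line."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def mirror_across_wall(p: Point, a: Point, b: Point) -> Point:
    """Reflect p across the infinite line through wall endpoints a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        raise ValueError("wall endpoints must differ")
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    foot = (a[0] + t * dx, a[1] + t * dy)  # perpendicular foot of p on the wall line
    return (2 * foot[0] - p[0], 2 * foot[1] - p[1])


def unfold_detection(p: Point, a: Point, b: Point, sensor: Point = (0.0, 0.0)) -> Point:
    """Mirror p back if it lies on the far side of the wall from the radar
    (single-bounce ghost assumption); otherwise keep it as a direct detection."""
    if side_of_line(p, a, b) * side_of_line(sensor, a, b) < 0:
        return mirror_across_wall(p, a, b)
    return p


if __name__ == "__main__":
    wall = ((5.0, 0.0), (5.0, 10.0))  # building facade along x = 5 m
    print(unfold_detection((7.0, 3.0), *wall))  # ghost behind the wall -> (3.0, 3.0)
    print(unfold_detection((3.0, 3.0), *wall))  # direct detection, unchanged
```

Real multipath can involve several bounces and diffraction at corners, so a layout-aware method would need more than this one-reflection rule.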

artificial intelligence, machine learning, spatial reasoning, (16 more...)

2508.02348

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Ground > Road (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.47)

Neural Information Processing Systems, Oct-10-2025, 05:54:06 GMT

71ac06f0f8450e7d49063c7bfb3257c2-Paper-Datasets_and_Benchmarks_Track.pdf

dataset, sensor, vehicle, (14 more...)

Country:

Europe > Switzerland (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.69)

Sutter, Carly, Sulia, Kara J., Bassill, Nick P., Wirz, Christopher D., Thorncroft, Christopher D., Rothenberger, Jay C., Przybylo, Vanessa, Cains, Mariana G., Radford, Jacob, Evans, David Aaron

Road Surface Condition Detection with Machine Learning using New York State Department of Transportation Camera Images and Weather Forecast Data

arXiv.org Artificial Intelligence, Oct-9-2025

The New York State Department of Transportation (NYSDOT) evaluates road conditions by driving on roads and observing live cameras, tasks that are labor-intensive but necessary for making critical operational decisions during winter weather events. Machine learning models can provide additional support for the NYSDOT by automatically classifying current road conditions across the state. In this study, convolutional neural networks and random forests are trained on camera images and weather data to predict road surface conditions. Models are trained on a hand-labeled dataset of 22,000 camera images, each classified by human labelers into one of six road surface conditions: severe snow, snow, wet, dry, poor visibility, or obstructed. Model generalizability is prioritized to meet the operational needs of NYSDOT decision makers, and the weather-related road surface condition model in this study achieves an accuracy of 81.5% on completely unseen cameras.

Keywords: Winter weather, Co-design, Artificial intelligence, Risk communication, Hand-labeled dataset

Highlights:
Developed a model to classify road surface conditions using image and weather data
Achieved an accuracy of 81.5% on completely unseen cameras for weather-related classes
Integrated co-design with end-users and interdisciplinary collaboration
Designed methods that prioritize model generalizability for operational applicability
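Reporting accuracy "on completely unseen cameras" implies that whole cameras, not individual images, are held out for testing. The sketch below shows such a camera-wise split plus plain accuracy; it is an illustration under assumptions, not the study's pipeline. The record fields (`camera_id`, `label`), the 20% test fraction and the seed are hypothetical, and the CNN and random-forest models themselves are not shown.

```python
import random
from typing import Dict, List, Sequence, Tuple

# The six road surface classes named in the abstract.
LABELS = ("severe snow", "snow", "wet", "dry", "poor visibility", "obstructed")


def split_by_camera(samples: Sequence[Dict], test_fraction: float = 0.2,
                    seed: int = 0) -> Tuple[List[Dict], List[Dict]]:
    """Hold out whole cameras so every test image comes from a camera unseen in training."""
    cameras = sorted({s["camera_id"] for s in samples})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(cameras)
    n_test = max(1, round(len(cameras) * test_fraction))
    test_cams = set(cameras[:n_test])
    train = [s for s in samples if s["camera_id"] not in test_cams]
    test = [s for s in samples if s["camera_id"] in test_cams]
    return train, test


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of images whose predicted class matches the hand label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need equally long, non-empty label sequences")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    samples = [
        {"camera_id": f"cam{c:02d}", "label": LABELS[(c + i) % len(LABELS)]}
        for c in range(10) for i in range(5)
    ]
    train, test = split_by_camera(samples)
    print(len(train), len(test))  # prints: 40 10
```

A random image-level split would let near-duplicate frames from the same camera appear in both sets and would likely overstate accuracy on new cameras.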

artificial intelligence, deep learning, machine learning, (17 more...)

2510.0644

Country: North America > United States > New York (0.65)

Genre: Research Report > New Finding (0.55)

Industry:

Transportation (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

arXiv.org Artificial Intelligence, Oct-1-2025

BEV-VLM: Trajectory Planning via Unified BEV Abstraction

Chen, Guancheng, Yang, Sheng, Zhan, Tong, Wang, Jian

ABSTRACT

This paper introduces BEV-VLM, a novel framework for trajectory planning in autonomous driving that leverages Vision-Language Models (VLMs) with Bird's-Eye View (BEV) feature maps as visual inputs. Unlike conventional approaches that rely solely on raw visual data such as camera images, our method uses highly compressed and informative BEV representations, which are generated by fusing multi-modal sensor data (e.g., camera and LiDAR) and aligning them with HD Maps. This unified BEV-HD Map format provides a geometrically consistent and rich scene description, enabling VLMs to perform accurate trajectory planning. Experimental results on the nuScenes dataset demonstrate a 44.8% improvement in planning accuracy and complete collision avoidance. Our work highlights that VLMs can effectively interpret processed visual representations like BEV features, expanding their applicability beyond raw images in trajectory planning.

Index Terms: Autonomous Driving, Vision-Language Model, Multi-Modal Learning

1. INTRODUCTION

In recent years, the pursuit of advanced autonomous driving (AD) has attracted extensive attention, with Vision-Language Models (VLMs) emerging as a promising pathway owing to the cognitive capabilities they acquire during pre-training, which enable effective application in real-world scenarios. While existing research has demonstrated the feasibility and reliability of using VLMs for path planning by feeding them camera images, these approaches suffer from two key limitations: they rely solely on camera data and thus lack integration with other modalities, such as LiDAR point clouds, and they do not explore VLMs' potential for planning based on Bird's-Eye View (BEV) features. To address these gaps, this work avoids the direct use of raw visual signals (e.g., camera images) as VLM inputs.
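As a rough picture of what a BEV representation is, the sketch below rasterizes LiDAR points in the ego frame into a top-down grid that stores the maximum height per cell. It covers only one simple LiDAR-to-BEV step under assumptions; the paper's fused camera-LiDAR BEV features, HD map alignment and VLM input format are not reproduced, and the 100 m window and 0.5 m cell size are hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def rasterize_bev(points: Iterable[Sequence[float]],
                  x_range: Tuple[float, float] = (-50.0, 50.0),
                  y_range: Tuple[float, float] = (-50.0, 50.0),
                  cell: float = 0.5) -> List[List[float]]:
    """Max-height BEV grid from (x, y, z, ...) LiDAR points in the ego frame.

    grid[row][col]: row indexes y, col indexes x. Empty cells hold -inf.
    """
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    grid = [[-math.inf] * nx for _ in range(ny)]
    for x, y, z, *_ in points:  # extra channels such as intensity are ignored
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue  # outside the BEV window
        col = min(int((x - x_range[0]) // cell), nx - 1)
        row = min(int((y - y_range[0]) // cell), ny - 1)
        grid[row][col] = max(grid[row][col], z)
    return grid


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (0.2, 0.1, 1.0), (10.0, -5.0, 0.3), (80.0, 0.0, 2.0)]
    grid = rasterize_bev(pts)
    occupied = sum(1 for row in grid for z in row if z != -math.inf)
    print(occupied)  # prints: 2 (first two points share a cell, last is out of range)
```

Learned BEV encoders replace the hand-written max-height rule with features aggregated by a network, but the ego-centred top-down grid layout is the same idea.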

artificial intelligence, bev feature, information, (14 more...)

2509.25249

Country: Asia > China (0.14)

Genre: Research Report (0.70)

Industry: Transportation (0.99)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.99)

Ochs, Sven, Schörner, Philip, Zofka, Marc René, Zöllner, J. Marius

Boosting LiDAR-Based Localization with Semantic Insight: Camera Projection versus Direct LiDAR Segmentation

arXiv.org Artificial Intelligence, Sep-26-2025

Semantic segmentation of LiDAR data presents considerable challenges, particularly when dealing with diverse sensor types and configurations. However, incorporating semantic information can significantly enhance the accuracy and robustness of LiDAR-based localization techniques for autonomous mobile systems. We propose an approach that integrates semantic camera data with LiDAR segmentation to address this challenge. By projecting LiDAR points into the semantic segmentation space of the camera, our method enhances the precision and reliability of the LiDAR-based localization pipeline. For validation, we use the CoCar NextGen platform from the FZI Research Center for Information Technology, which offers diverse sensor modalities and configurations. The sensor setup of CoCar NextGen enables a thorough analysis of different sensor types. Our evaluation leverages the state-of-the-art Depth-Anything network for camera image segmentation and an adaptive segmentation network for LiDAR segmentation. To establish a reliable ground truth for LiDAR-based localization, we make use of a Global Navigation Satellite System (GNSS) solution with Real-Time Kinematic (RTK) corrections. Additionally, we conduct an extensive 55 km drive through the city of Karlsruhe, Germany, covering a variety of environments, including urban areas, multi-lane roads, and rural highways. This multimodal approach paves the way for more reliable and precise autonomous navigation systems, particularly in complex real-world environments.
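"Projecting LiDAR points into the semantic segmentation space of the camera" can be sketched with a distortion-free pinhole model: transform each point into the camera frame, project it to a pixel, and read the class from the segmentation mask. The extrinsics R and t, the intrinsics f, cx and cy, the camera-frame convention (x right, y down, z forward) and the function name are assumptions for illustration; the paper's calibration, distortion handling and time synchronization are not shown.

```python
import math
from typing import List, Sequence

Matrix3 = Sequence[Sequence[float]]


def label_lidar_points(points: Sequence[Sequence[float]], R: Matrix3, t: Sequence[float],
                       f: float, cx: float, cy: float,
                       mask: Sequence[Sequence[int]], unknown: int = -1) -> List[int]:
    """Give each LiDAR point the semantic class of the pixel it projects onto.

    R, t map LiDAR coordinates into the camera frame (x right, y down, z forward).
    Points behind the camera or outside the image get `unknown`.
    """
    height, width = len(mask), len(mask[0])
    labels = []
    for p in points:
        # Rigid transform X_cam = R @ p + t, using only the x, y, z components.
        xc, yc, zc = (sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))
        if zc <= 0:
            labels.append(unknown)  # behind the image plane
            continue
        u = f * xc / zc + cx  # pinhole projection, no lens distortion
        v = f * yc / zc + cy
        col, row = math.floor(u), math.floor(v)
        if 0 <= row < height and 0 <= col < width:
            labels.append(mask[row][col])
        else:
            labels.append(unknown)
    return labels


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    mask = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 toy label image
    pts = [(0.0, 0.0, 1.0), (0.5, -0.5, 1.0), (0.0, 0.0, -1.0), (10.0, 0.0, 1.0)]
    print(label_lidar_points(pts, identity, [0.0, 0.0, 0.0], 2.0, 2.0, 2.0, mask))
    # prints: [10, 7, -1, -1]
```

Because the lookup ignores occlusion, a LiDAR point hidden from the camera inherits the label of whatever occludes it; a depth check against a z-buffer is one way to filter such points.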

artificial intelligence, machine learning, segmentation, (15 more...)

2509.20486

Country: Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.25)

Genre: Research Report (1.00)

Industry: Transportation > Ground > Road (0.48)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)